otel: subchannel metrics #12202

Open
wants to merge 9 commits into
base: master
from

Conversation

AgraVator
Contributor

@AgraVator AgraVator commented Jul 7, 2025

Implements A94

@AgraVator AgraVator marked this pull request as ready for review July 9, 2025 10:05
@AgraVator AgraVator force-pushed the otel-subchannel-metrics branch from eabbf58 to 9595507 Compare July 15, 2025 10:00
Comment on lines 629 to 635
subchannelMetrics.recordConnectionAttemptFailed(buildLabelSet(
    getAttributeOrDefault(
        addressIndex.getCurrentEagAttributes(), NameResolver.ATTR_BACKEND_SERVICE),
    getAttributeOrDefault(
        addressIndex.getCurrentEagAttributes(), LoadBalancer.ATTR_LOCALITY_NAME),
    null, null
));
Contributor Author

@AgraVator AgraVator Jul 16, 2025

Should this be triggered from here, or before addressIndex.increment() at line 658?

@AgraVator AgraVator force-pushed the otel-subchannel-metrics branch from 9595507 to c713561 Compare July 22, 2025 17:55
@@ -415,7 +415,7 @@ void exitIdleMode() {
   LbHelperImpl lbHelper = new LbHelperImpl();
   lbHelper.lb = loadBalancerFactory.newLoadBalancer(lbHelper);
   // Delay setting lbHelper until fully initialized, since loadBalancerFactory is user code and
-  // may throw. We don't want to confuse our state, even if we will enter panic mode.
+  // may throw. We don't want to confuse our state, even if we enter panic mode.
Contributor

I think the comment is more accurate as it is. After entering panic mode there is not much we can do about state. The comment implies delaying entry into panic mode and keeping the channel in a sane state for as long as possible before bringing in potential user code.
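
The pattern under discussion, building the helper in a local variable and assigning the field only once the user-supplied factory has returned, can be sketched as follows. This is a minimal illustration with hypothetical stand-in types (`DelayedHelperSketch`, `Helper`), not the actual `ManagedChannelImpl` code. It also catches the exception for brevity, whereas the real channel would let it propagate and enter panic mode.

```java
import java.util.function.Function;

// Hypothetical stand-ins for illustration; not the real gRPC channel types.
final class DelayedHelperSketch {
  static final class Helper {
    Object lb; // set once the factory returns
  }

  private Helper lbHelper; // stays null until initialization fully succeeds

  /** Builds the helper locally and publishes it only if the factory did not throw. */
  boolean exitIdleMode(Function<Helper, Object> factory) {
    Helper local = new Helper();
    try {
      local.lb = factory.apply(local); // user code; may throw
    } catch (RuntimeException e) {
      // Simplification: the field is untouched, so existing state is not confused.
      return false;
    }
    this.lbHelper = local; // publish only after full initialization
    return true;
  }

  boolean isInitialized() {
    return lbHelper != null;
  }
}
```

If the factory throws, `isInitialized()` still reports `false`, which is the "don't confuse our state" property the code comment describes.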

addressIndex.getCurrentEagAttributes(), NameResolver.ATTR_BACKEND_SERVICE),
getAttributeOrDefault(
addressIndex.getCurrentEagAttributes(), LoadBalancer.ATTR_LOCALITY_NAME),
"Peer Pressure",
Contributor

@kannanjgithub kannanjgithub Jul 23, 2025

This is not in the gRFC? Instead, the gRFC has a "List of allowed values for grpc.disconnect_error".

Contributor Author

For Phase 1 we won't be plumbing disconnect_error. I will raise another PR for that, with this branch as its base.

@@ -593,6 +602,15 @@ public void run() {
pendingTransport = null;
connectedAddressAttributes = addressIndex.getCurrentEagAttributes();
gotoNonErrorState(READY);
subchannelMetrics.recordConnectionAttemptSucceeded(buildLabelSet(
Contributor

@kannanjgithub kannanjgithub Jul 23, 2025

Label for target? Also for disconnections and connection attempt failures below.

Contributor Author

buildLabelSet() gets it from the class field.
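
To illustrate the reply above: call sites pass only per-event values, and the target label comes from a field set at construction time. The sketch below is an assumption about the shape of such a helper, not the PR's code. The class name `LabelSetSketch`, the two-argument signature, the `Map` return type, the empty-string default, and the label keys are all illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the PR's actual buildLabelSet() signature and return type may differ.
final class LabelSetSketch {
  private final String target; // class field, fixed when the object is created

  LabelSetSketch(String target) {
    this.target = target;
  }

  /** Callers supply per-event values only; the target is read from the field. */
  Map<String, String> buildLabelSet(String backendService, String locality) {
    Map<String, String> labels = new LinkedHashMap<>();
    labels.put("grpc.target", target); // illustrative label key
    labels.put("grpc.lb.backend_service", orEmpty(backendService));
    labels.put("grpc.lb.locality", orEmpty(locality));
    return labels;
  }

  // Assumed default: a missing attribute becomes an empty string.
  private static String orEmpty(String value) {
    return value == null ? "" : value;
  }
}
```

With this shape, the success, failure, and disconnection call sites all carry the target label without passing it explicitly.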

* @param optionalLabelValues the optional label values for the metric.
*/
@Override
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value,
Contributor

Add unit tests.

Contributor Author

These are just wrappers on the underlying OTel API for UpDownCounter...

@@ -117,6 +119,22 @@ public void addLongCounter(LongCounterMetricInstrument metricInstrument, long va
counter.add(value, attributes);
}

@Override
public void addLongUpDownCounter(LongUpDownCounterMetricInstrument metricInstrument, long value,
Contributor

Add unit test.

* @return the newly created LongUpDownCounterMetricInstrument
* @throws IllegalStateException if a metric with the same name already exists
*/
public LongUpDownCounterMetricInstrument registerLongUpDownCounter(String name,
Contributor

Add unit test.
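
As a rough idea of what such a test could exercise, the sketch below models only the documented contract: registration returns a new instrument, and a second registration under the same name throws `IllegalStateException`. `RegistrySketch`, its single-argument method, and the metric name are hypothetical stand-ins, not the real registry class or its full signature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the instrument registry; only the duplicate-name rule is modeled.
final class RegistrySketch {
  static final class UpDownCounterInstrument {
    final String name;

    UpDownCounterInstrument(String name) {
      this.name = name;
    }
  }

  private final Map<String, UpDownCounterInstrument> byName = new HashMap<>();

  /** Registers a new instrument, rejecting names that are already taken. */
  synchronized UpDownCounterInstrument registerLongUpDownCounter(String name) {
    if (byName.containsKey(name)) {
      throw new IllegalStateException("Metric with name " + name + " already exists");
    }
    UpDownCounterInstrument instrument = new UpDownCounterInstrument(name);
    byName.put(name, instrument);
    return instrument;
  }

  /** Test-style check: returns true if a second registration of the same name throws. */
  static boolean duplicateRegistrationThrows(String name) {
    RegistrySketch registry = new RegistrySketch();
    registry.registerLongUpDownCounter(name);
    try {
      registry.registerLongUpDownCounter(name);
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }
}
```

A real test in the PR would presumably target the actual registry and also assert on the returned instrument's description, unit, and label keys.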

2 participants